System and method for voice authentication

ABSTRACT

Embodiments of the invention provide for secure voice authentication through a communication device or access device. Certain embodiments allow for providing a word string to a communication device or authentication device. The communication or authentication device plays a supplemental signal that is unique to a transaction. The communication device or authentication device concurrently records an audio segment originating from the user and the supplemental signal. The audio segment is an attempt by the user to vocally reproduce the word string. The communication device or authentication device sends the concurrently recorded audio segment and supplemental signal, to a computer, where the computer authenticates the user.

CROSS-REFERENCES TO RELATED APPLICATIONS

The present application is a non-provisional application of and claims priority to U.S. Provisional Application No. 61/739,464, filed on Dec. 19, 2012, the entire contents of which are herein incorporated by reference for all purposes.

BACKGROUND

A number of instances have occurred where a cardholder may wish to make a purchase with their communication device or initiate some other type of e-commerce transaction from their communication device. However, these types of purchases/transactions inherently carry a high risk because the cardholder does not use their physical payment card to complete the purchase/transaction. As such, many card issuers or merchants may wish to have an extra level of security to verify the identity of the cardholder wishing to complete the purchase/transaction with their communication device. Current solutions have attempted to use voice authentication to verify the identity of the cardholder. However, these solutions are vulnerable to replay attacks because the user's voice could easily be recorded and replayed by fraudsters.

Embodiments of the invention address this and other problems, both individually and collectively.

SUMMARY

Embodiments of the invention, broadly described, allow for user voice authentication. More specifically, the invention pertains to transactions initiated from a communication device, such as a mobile phone or personal computer, for both face-to-face and remote payment environments.

Embodiments of the present invention relate to systems and methods for performing voice authentication for a user at a communication device or access device. A user may initiate a financial transaction from his/her communication device or at an access device, such as a payment terminal. The user may then be asked to provide a voice sample by speaking a certain word or phrase. Concurrent to the user speaking the word or phrase, the communication device or payment terminal may emit a supplemental signal, such as an inaudible sound, received from the payment processor, via the speaker of the communication device or payment terminal. The microphone of the communication device or payment terminal may capture the supplemental signal along with the spoken word/phrase and send the captured voice sample and supplemental signal along with transaction information to the acquirer, who forwards it to the payment processor. The payment processor may verify that the supplemental signal received is the same as the one initially sent to the communication device. If the payment processor determines a match between the received inaudible sound and the initially sent supplemental signal, the user may be authenticated accordingly.

One embodiment of the invention is directed to a method for authenticating a user for a transaction comprising providing, by a device, a word string. The method includes playing, by the device, a supplemental signal unique to the transaction. The method further includes concurrently recording, by the device, an audio segment originating from the user and the supplemental signal, wherein the audio segment is an attempt by the user to vocally reproduce the word string. The method additionally includes sending, by the device, the concurrently recorded audio segment and supplemental signal to a computer, wherein the computer authenticates the user.

Another embodiment of the invention is directed to a device comprising a processor, and a computer readable medium coupled to the processor. The computer readable medium comprises code, executable by the processor, for implementing the above-described method.

Another embodiment of the invention is directed to a method for authenticating a user for a transaction including receiving, by a server computer, an audio segment and supplemental signal unique to the transaction, wherein the audio segment and supplemental signal were concurrently recorded by a device and the audio segment originates from the user and is an attempt by the user to vocally reproduce a word string provided by the device. The method also includes verifying, by the server computer, that the supplemental signal received in the payment authorization request matches the supplemental signal provided to the device.

Another embodiment of the invention is directed to a device comprising a processor, and a computer readable medium coupled to the processor. The computer readable medium comprises code, executable by the processor, for implementing the above-described method.

It can be appreciated that while the discussion herein describes examples using a payment card and a cardholder, the payment card may be generically referred to as any payment instrument and the cardholder may be generically referred to as a user in other embodiments (where a card is not present).

These and other embodiments of the invention are described in further detail below.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a payment system, according to an embodiment of the present invention.

FIG. 2 is a block diagram of a communication device, according to an embodiment of the present invention.

FIG. 3 is a block diagram of a server computer, according to an embodiment of the present invention.

FIG. 4 is a flow diagram illustrating a method for authenticating a user for a transaction via a communication device, according to an embodiment of the present invention.

FIG. 5 is a flow diagram illustrating a method for authenticating a user 401 for a transaction via an access device 120, according to an embodiment of the present invention.

FIG. 6A shows a screenshot of initial voice authentication enrollment on a communication device 110, according to an embodiment of the present invention.

FIG. 6B shows a screenshot of capturing voice data for voice authentication enrollment, according to an embodiment of the present invention.

FIG. 6C shows a screenshot of progressive feedback while capturing user voice data for voice authentication enrollment, according to an embodiment of the present invention.

FIG. 7A shows a screenshot of voice authentication on a communication device 110 using a first word string 710, according to an embodiment of the present invention.

FIG. 7B shows a screenshot of speaker verification on a communication device using a second word string 720, according to an embodiment of the present invention.

FIG. 8 illustrates contents of a supplemental signal database 340, according to an embodiment of the present invention.

FIG. 9 is a flow diagram illustrating a method 900 for authenticating a user for a transaction, according to an embodiment of the present invention.

FIG. 10 is a diagram of a computer apparatus 1000, according to an example embodiment.

DETAILED DESCRIPTION

Prior to discussing the specific embodiments of the invention, a further description of some terms can be provided for a better understanding of embodiments of the invention.

A “payment device” may include any suitable device capable of making a payment. For example, a payment device can include a card including a credit card, debit card, charge card, gift card, or any combination thereof. A payment device can be used in conjunction with a communication device, as further defined below.

A “payment processing network” (e.g., VisaNet™) may include data processing subsystems, networks, and operations used to support and deliver authorization services, exception file services, and clearing and settlement services. An exemplary payment processing network may include VisaNet™. Payment processing networks such as VisaNet™ are able to process credit card transactions, debit card transactions, and other types of commercial transactions. VisaNet™, in particular, includes a VIP system (Visa Integrated Payments system) which processes authorization requests and a Base II system which performs clearing and settlement services.

A “server computer” can be a powerful computer or a cluster of computers. For example, the server computer can be a large mainframe, a minicomputer cluster, or a group of servers functioning as a unit. In one example, the server computer may be a database server coupled to a Web server.

An “access device” (e.g., a point-of-service (POS) terminal) can be any suitable device configured to process payment transactions such as credit card or debit card transactions, or electronic settlement transactions, and may have optical, electrical, or magnetic readers for reading data from other portable communication devices such as smart cards, keychain devices, cell phones, payment cards, security cards, access cards, and the like.

An “acquirer” is a business entity (e.g., a commercial bank) that typically has a business relationship with the merchant and receives some or all of the transactions from that merchant.

An “issuer” is a business entity which issues a card to a user. Typically, an issuer is a financial institution.

A “cardholder” is a type of user that is authorized to use a payment card issued by the issuer. The terms “cardholder” and “user” may be used interchangeably in the following description. A “user” and/or “cardholder” may be any competent individual.

“Speaker recognition” is the identification of a user who is speaking based on characteristics of their voice (voice biometrics). Speaker recognition uses the acoustic features of speech that have been found to differ between individuals. These acoustic patterns reflect both anatomy (e.g., size and shape of the throat and mouth) and learned behavioral patterns (e.g., voice pitch, speaking style).

“Speech recognition” is the translation of spoken words into text understandable by a computer system. Speech recognition combined with speaker recognition may simplify the task of translating speech in systems that are used to authenticate or verify the identity of a speaker as part of a security process.

“Voice recognition” may be used to describe both “speaker recognition” and “speech recognition”.

A “voice profile,” as described herein, can be a profile or model representing a risk factor associated with a user. The voice profile may contain information about current and prior user authentications with a verification system. For example, the voice profile may contain the time, location, voice data, and match score associated with each particular voice authentication with the verification system by the user. The combination of information within the voice profile about prior authentications may be used to determine the risk factor associated with the user.

A “word string,” as described herein, can be a combination of a number of words arranged in a particular order. A user may be requested to repeat a prompt for authentication purposes. The terms “prompt” and “word string” may be used interchangeably in the following description.

“Voice data” or a “voice sample,” as described herein, can be captured digital audio data of a user's voice. For example, a voice sample may be a captured digital data signal of a user who wishes to authenticate with a transaction system. The user may be requested to repeat a certain prompt. The microphone may capture the prompt repeated by the user and pass the audio data to another module for speaker verification. The terms “voice sample,” “voice data,” and “audio segment” may be used interchangeably in the following description.

A “match score,” as described herein, can be a relationship between received input data and stored data. In the context of the present invention, the received input data can be a captured voice sample. The stored data can be a previously captured and stored voice sample. The match score may express the degree of confidence between the received input data and the stored data. The match score may be passed to other parts of a risk scoring mechanism, such that the match score contributes along with other risk parameters to an overall decision, for example, approving or declining a transaction. Setting an appropriate threshold to ensure an acceptable level of accuracy would be appreciated by one of ordinary skill in the art. This concept can be applied to other biometric data apart from voice samples (e.g., retinal scans, facial recognition data, etc.).
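
As a minimal sketch of how a match score and threshold might be computed, the following Python example treats the received and stored voice samples as fixed-length feature vectors and uses cosine similarity rescaled to the range 0 to 1. The feature-vector representation, the function names `match_score` and `passes_threshold`, and the default threshold of 0.9 are illustrative assumptions and are not specified by this description.

```python
import math


def match_score(received, stored):
    """Cosine similarity of two feature vectors, rescaled from [-1, 1] to [0, 1].

    Both arguments are hypothetical fixed-length feature vectors derived
    from a captured voice sample and a previously stored voice sample.
    """
    if len(received) != len(stored):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(received, stored))
    norm = math.sqrt(sum(a * a for a in received)) * math.sqrt(
        sum(b * b for b in stored)
    )
    if norm == 0.0:
        return 0.0  # an all-zero vector carries no information
    return (dot / norm + 1.0) / 2.0


def passes_threshold(score, threshold=0.9):
    """Compare a match score against an assumed, tunable threshold."""
    return score >= threshold
```

In such a sketch the score would not decide the transaction on its own; it could be handed to a wider risk scoring mechanism together with other risk parameters.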

A “supplemental signal,” as described herein, can be any signal that may be output by the communication device or access terminal concurrent to the user performing voice authentication. The supplemental signal may include an inaudible sound or an audible sound, or some other signal that may be played back using a speaker of the device.

A “communication device,” as described herein, can be any electronic communication device that can execute and/or support electronic communications including, but not limited to, payment transactions. Some examples include a personal digital assistant (PDA), a smart phone, tablet computer, notebook computer, and the like.

An “authorization request message” may be an electronic message that is sent to a payment processing network and/or an issuer of a payment card to request authorization for a transaction. An authorization request message according to some embodiments may comply with International Organization for Standardization (ISO) 8583, which is a standard for systems that exchange electronic transaction information associated with a payment made by a consumer using a payment device or payment account. The authorization request message may include an issuer account identifier that may be associated with a payment device or payment account. An authorization request message may also comprise additional data elements corresponding to “identification information” including, by way of example only: a service code, a CVV (card verification value), a dCVV (dynamic card verification value), an expiration date, etc. An authorization request message may also comprise “transaction information,” such as any information associated with a current transaction, such as the transaction amount, merchant identifier, merchant location, etc., as well as any other information that may be utilized in determining whether to identify and/or authorize a transaction.

An “authorization response message” may be an electronic message reply to an authorization request message generated by an issuing financial institution or a payment processing network. The authorization response message may include, by way of example only, one or more of the following status indicators: Approval: transaction was approved; Decline: transaction was not approved; or Call Center: response pending more information, merchant must call the toll-free authorization phone number. The authorization response message may also include an authorization code, which may be a code that a credit card issuing bank returns in response to an authorization request message in an electronic message (either directly or through the payment processing network) to the merchant's access device (e.g., POS equipment) that indicates approval of the transaction. The code may serve as proof of authorization. As noted above, in some embodiments, a payment processing network may generate or forward the authorization response message to the merchant.

As used herein, a “communications channel” may refer to any suitable path for communication between two or more entities. Suitable communications channels may be present directly between two entities such as a payment processing network and a merchant or issuer computer, or may include a number of different entities. Any suitable communications protocols may be used for generating a communications channel. A communication channel may in some instances comprise a “secure communication channel,” which may be established in any known manner, including the use of mutual authentication and a session key and establishment of a secure socket layer (SSL) session. However, any method of creating a secure channel may be used. By establishing a secure channel, sensitive information related to a payment device (such as account numbers, CVV values, expiration dates, etc.) may be securely transmitted between the two or more entities to facilitate a transaction.

I. Exemplary Systems

FIG. 1 is a block diagram of a payment system 100, according to one embodiment of the present invention. The system 100 includes a payment device 105, a communication device 110, an access device 120, a merchant 125, an acquirer 130, a payment processing network 140, an issuer 150, and an interconnected network 160. The acquirer 130 may further include an acquirer computer (not shown). The payment processing network 140 may include an authorization and settlement server and/or additional servers (not shown) to carry out the various transactions described herein.

In an embodiment, the communication device 110 is in electronic communication with the access device 120. The communication device 110 can be a personal digital assistant (PDA), a smart phone, tablet computer, notebook computer, or the like, that can execute and/or support payment transactions with a payment system 100. A communication device 110 can be used in conjunction with a payment card 105, such as a credit card, debit card, charge card, gift card, or other payment device and/or any combination thereof. The combination of a payment card 105 (e.g., credit card) and the communication device 110 (e.g., smart phone) can be referred to as the communication device 110 for illustrative purposes. In other embodiments, the communication device 110 may be used in conjunction with transactions of currency or points (e.g., points accumulated in a particular software application). In further embodiments, the communication device 110 may be a wireless device, a contactless device, a magnetic device, or other type of payment device that would be known and appreciated by one of ordinary skill in the art with the benefit of this disclosure. In some embodiments, the communication device 110 includes software (e.g., application) and/or hardware to perform the various payment transactions and capture user voice data as further described below.

The access device 120 is configured to be in electronic communication with the acquirer 130 via a merchant 125. In one embodiment, the access device 120 is a point-of-service (POS) device. Alternatively, the access device 120 can be any suitable device configured to process payment transactions such as credit card or debit card transactions, or electronic settlement transactions, and may have optical, electrical, or magnetic readers for reading data from portable electronic communication devices such as smart cards, keychain devices, cell phones, payment cards, security cards, access cards, and the like. In some embodiments, the access device 120 is located at and controlled by a merchant. For example, the access device 120 can be a POS device at a grocery store checkout line. In other embodiments, the terminal could be a client computer or a mobile phone in the event that the user is conducting a remote transaction.

The acquirer 130 (e.g., acquirer bank) includes an acquirer computer (not shown). The acquirer computer can be configured to transfer data (e.g., bank identification number (BIN), etc.) and financial information to the payment processing network 140. In some embodiments, the acquirer 130 does not need to be present in the system 100 for the communication device 110 to transfer the financial and user data to the payment processing network 140. In one non-limiting example, the acquiring bank 130 can additionally check the credentials of the user against a watch list in order to prevent fraud and money laundering schemes, as would be appreciated by one of ordinary skill in the art.

In one embodiment, the payment processing network 140 is VisaNet™, where Visa internal processing (VIP) performs the various payment processing network 140 or multi-lateral switch functions described herein. The payment processing network 140 can include an authorization and settlement server (not shown). The authorization and settlement server (“authorization server”) performs payment authorization functions. The authorization server is further configured to send and receive authorization data to and from the issuer 150. Furthermore, the payment processing network 140 can receive a voice sample by the user (e.g., from the communication device 110, access device 120, or acquirer 130) to determine a risk factor associated with a transaction, as further described below.

In some embodiments, the issuer 150 is a business entity which issues a card to a cardholder. Typically, an issuer is a financial institution. The issuer 150 is configured to receive the authorization data from the payment processing network 140 (e.g., the authorization server). The issuer 150 receives authentication data from the authorization server and determines if the user is authorized to perform a given financial transaction (e.g., cash deposit/withdrawal, money transfer, balance inquiry) based on whether the user was authenticated by an identification system.

In some embodiments, the communication device 110 may be connected to and communicate with the payment processing network 140 via an interconnected network 160. One example of an interconnected network 160 is the Internet. The payment processing network 140 may inform the communication device 110 when a payment has been successfully processed. In some embodiments, the payment processing network 140 may be connected to and communicate with the access device 120 via the interconnected network 160. The payment processing network 140 may inform the access device 120 when a payment has been successfully processed, after which the access device 120 may in turn complete the transaction with the communication device 110.

A server computer 300 is also shown in FIG. 1, and is in operative communication with the interconnected network 160. Details regarding the server computer 300 are provided below.

The interconnected network 160 may comprise one or more of a local area network, a wide area network, a metropolitan area network (MAN), an intranet, the Internet, a Public Land Mobile Network (PLMN), a telephone network, such as the Public Switched Telephone Network (PSTN) or a cellular telephone network (e.g., wireless Global System for Mobile Communications (GSM), wireless Code Division Multiple Access (CDMA), etc.), a VoIP network with mobile and/or fixed locations, a wireline network, or a combination of networks.

In a typical payment transaction in embodiments of the invention, a user may interact with the access device 120 (e.g., with a payment device such as a payment card, or by entering payment information) to conduct a transaction with the merchant 125. The merchant 125 may operate a merchant computer, which may route an authorization request message to the acquirer 130, and eventually to the issuer 150 via the payment processing network 140.

The issuer 150 will then determine if the transaction is authorized (e.g., by checking for fraud and/or sufficient funds or credit). The issuer will then transmit an authorization response message to the access device 120 via the payment processing network 140 and the acquirer 130.

At the end of the day, the transaction is cleared and settled between the acquirer 130 and the issuer 150 by the payment processing network 140.

The description below provides descriptions of other components in the system as well as authentication methods using voice samples. The authentication methods can be performed at any suitable point during the above-described transaction flow. For example, the voice authentication method may be performed before or after the user uses a payment device to interact with the access device 120. If it is afterwards, then the authentication method may be performed when the authorization request message is received by the payment processing network 140 or the issuer 150.

FIG. 2 is a block diagram of a communication device 110, according to an embodiment of the present invention. Communication device 110 includes a processor 210, a microphone 220, a display 230, an input device 240, a speaker 250, a memory 260, and a computer-readable medium 270.

Processor 210 may be any general-purpose processor operable to carry out instructions on the communication device 110. The processor 210 is coupled to other units of the communication device 110 including display 230, input device 240, speaker 250, memory 260, and computer-readable medium 270.

Microphone 220 may be any device that converts sound to an electric signal. In some embodiments, microphone 220 may be used to capture voice data from a user.

Display 230 may be any device that displays information to a user. Examples may include an LCD screen, CRT monitor, or seven-segment display.

Input device 240 may be any device that accepts input from a user. Examples may include a keyboard, keypad, or mouse. In some embodiments, microphone 220 may be considered an input device 240.

Speaker 250 may be any device that outputs sound to a user. Examples may include a built-in speaker or any other device that produces sound in response to an electrical audio signal. In some embodiments, speaker 250 may be used to output a supplemental signal to the user during audio authentication.

Memory 260 may be any magnetic, electronic, or optical memory. Memory 260 includes two memory modules, module 1 262 and module 2 264. It can be appreciated that memory 260 may include any number of memory modules. An example of memory 260 may be dynamic random access memory (DRAM).

Computer-readable medium 270 may be any magnetic, electronic, optical, or other computer-readable storage medium. Computer-readable storage medium 270 includes audio segment capture module 272, audio segment transmission module 274, and supplemental signal playback module 276. Computer-readable storage medium 270 may comprise any combination of volatile and/or non-volatile memory such as, for example, buffer memory, RAM, DRAM, ROM, flash, or any other suitable memory device, alone or in combination with other data storage devices.

Audio segment capture module 272 is configured to capture one or more audio segments, via microphone 220, by a user for voice authentication purposes. In some embodiments, audio segment capture module 272 may capture voice data by the user for purposes of initially registering a user, for subsequent voice authentication, for the first time. In some embodiments, audio segment capture module 272 may capture voice data, via microphone 220, for purposes of authenticating a user in order to complete a transaction. For example, communication device 110 may request a user to register or authenticate his/her voice data by displaying a prompt, on display 230, to repeat (by speaking into microphone 220) a specific word string. During subsequent voice authentication, the audio segment capture module 272 may capture the user's attempt to speak a predetermined word string provided to the user by the communication device 110. In some embodiments, the word string may be provided to the communication device 110 by a server computer. Upon capturing the user's voice data via microphone 220, the audio segment corresponding to the prompted word string may be transmitted to a server computer via audio segment transmission module 274 for purposes of storing the audio segment for subsequent user authentication, described below. In some embodiments, the audio segment may also include a supplemental signal, described below.

Audio segment transmission module 274 is configured to transmit captured audio segments to a server computer. In some embodiments, the captured audio segments may be voice data captured during user registration and/or authentication by audio segment capture module 272, described above. In some embodiments, the captured audio segment may be voice data captured during subsequent authentication using voice data by the user, described in further detail below. In yet other embodiments, the captured audio segment may include the voice data by the user and a supplemental signal, as described below.

Supplemental signal playback module 276 is configured to play a supplemental signal, via speaker 250, during user voice authentication. In some embodiments, the supplemental signal may be an inaudible sound or an audible sound. The supplemental signal may be generated by a server computer and received by the communication device 110 over an input/output interface. In other embodiments, the supplemental signal may be generated locally on the communication device 110. The supplemental signal playback module 276 may play the supplemental signal, via speaker 250, while the user attempts to reproduce the word string. As a result, the audio segment capture module 272 may capture, via microphone 220, the supplemental signal simultaneously with the user's voice data. The user's voice data and the supplemental signal may be collectively referred to as the audio segment. In other embodiments, the term audio segment may relate only to the user's voice data.
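
The concurrent playback and capture described above can be illustrated with a short Python sketch that works on lists of samples rather than real audio hardware. It assumes, purely for illustration, that the supplemental signal is a single near-ultrasonic tone, that the microphone capture is the sample-wise sum of the voice and the tone, and that the tone is later detected with the Goertzel algorithm. The 19 kHz frequency, the 44.1 kHz sample rate, the detection threshold, and all function names are hypothetical and are not taken from this description.

```python
import math

SAMPLE_RATE = 44100  # samples per second (assumed)


def tone(freq_hz, duration_s, amplitude=0.2):
    """Generate a sine tone as a list of float samples."""
    n = round(SAMPLE_RATE * duration_s)
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(n)
    ]


def mix(voice, signal):
    """Model the microphone capturing voice and supplemental signal together."""
    n = min(len(voice), len(signal))
    return [voice[i] + signal[i] for i in range(n)]


def goertzel_power(samples, freq_hz):
    """Squared magnitude of the samples' spectrum at one frequency."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / SAMPLE_RATE)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def contains_tone(samples, freq_hz, min_fraction=0.01):
    """True if at least min_fraction of the recording's energy sits at freq_hz."""
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return False
    fraction = 2.0 * goertzel_power(samples, freq_hz) / (len(samples) * energy)
    return fraction >= min_fraction


# Stand-in for speech: a 220 Hz tone. Real voice data would come from the microphone.
voice = tone(220.0, 0.1, amplitude=0.5)
recorded = mix(voice, tone(19000.0, 0.1))
```

A signal that is unique to a transaction would need to carry more information than one fixed tone, for example a transaction-specific sequence of frequencies; the single tone here only keeps the sketch short.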

FIG. 3 is a block diagram of a server computer 300, according to an embodiment of the present invention. Server computer 300 includes an input/output interface 310, a memory 320, a processor 330, a supplemental signal database 340, a user voice profile database 350, and a computer-readable medium 360. In some embodiments, the server computer may reside within the interconnected network 160.

The input/output (I/O) interface 310 is configured to receive and transmit data. For example, the I/O interface 310 may receive the audio segment from the communication device 110 (FIG. 1), via the audio segment transmission module 274 (FIG. 2). Upon processing and verifying the authenticity of the audio segment, the I/O interface 310 may indicate to the access device 120 (FIG. 1) and/or communication device 110 (FIG. 1) that a payment transaction may proceed. The I/O interface 310 may also be used for direct interaction with the server computer. The I/O interface 310 may accept input from an input device such as, but not limited to, a keyboard, keypad, or mouse. Further, the I/O interface may display output on a display device.

Memory 320 may be any magnetic, electronic, or optical memory. It can be appreciated that memory 320 may include any number of memory modules, which may comprise any suitable volatile or non-volatile memory devices. An example of memory 320 may be dynamic random access memory (DRAM).

Processor 330 may be any general-purpose processor operable to carry out instructions on the server computer 300. The processor 330 is coupled to other units of the server computer 300 including input/output interface 310, memory 320, supplemental signal database 340, user voice profile database 350, and computer-readable medium 360.

Supplemental signal database 340 is configured to store a variety of supplemental signals played back by the communication device 110 (FIG. 1) simultaneous to the user performing voice authentication. In some embodiments, the supplemental signals may be inaudible or audible sounds. While FIG. 3 shows the supplemental signal database 340 residing within the server computer 300, in some embodiments, the supplemental signal database 340 may reside external to and be communicatively coupled with the server computer 300. The supplemental signals stored in the supplemental signal database 340 may be digitally stored. For example, the supplemental signals may be stored in a digital binary format within the supplemental signal database 340. The supplemental signal database 340 may be updated to add, change, or remove any supplemental signals stored. These attributes of the supplemental signal database are described in detail in FIG. 8.
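
One way to picture the supplemental signal database is as a store keyed by transaction, from which a signal is issued before playback and against which the received signal is later checked. The Python sketch below is an in-memory illustration under several assumptions that this description does not state: signals are random byte strings, each is looked up by a hypothetical transaction identifier, and a signal is removed once it has been checked so that it cannot be matched twice. The class and method names are likewise hypothetical.

```python
import hmac
import secrets


class SupplementalSignalStore:
    """In-memory stand-in for a supplemental signal database (illustrative only)."""

    def __init__(self):
        self._signals = {}  # transaction id -> signal bytes in binary format

    def issue(self, transaction_id, n_bytes=16):
        """Create and store a random signal for one transaction."""
        signal = secrets.token_bytes(n_bytes)
        self._signals[transaction_id] = signal
        return signal

    def verify(self, transaction_id, received):
        """Check a received signal against the stored one, then discard it.

        Removing the entry after the first check is an assumption made
        here so that a replayed recording cannot match a second time.
        """
        expected = self._signals.pop(transaction_id, None)
        if expected is None:
            return False
        return hmac.compare_digest(expected, received)
```

In practice the received signal would first have to be recovered from the recorded audio segment before a comparison of this kind could be made.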

The user voice profile database 350 is configured to store a voice profile of a payment user. The voice profile of a payment user may include attributes such as, but not limited to, initiation time of the payment transaction, the payment cardholder's name, the voice data associated with the payment transaction, the outcome of payment cardholder verification/authentication, and a match score for the audio data. Each time the payment user performs voice authentication via the communication device 110 (FIG. 1), the user voice profile database 350 may be updated based on the audio segment received by the server computer 300 and details of the payment transaction. Additionally, the user voice profile database 350 may store user voice profiles for a plurality of payment users.

Computer-readable medium 360 may be any magnetic, electronic, optical, or other computer-readable storage medium. Computer-readable storage medium 360 includes word string generation module 362, supplemental signal transmission module 364, word string reproduction determination module 366, and supplemental signal match determination module 368. Computer-readable storage medium 360 may comprise any combination of volatile and/or non-volatile memory such as, for example, buffer memory, RAM, DRAM, ROM, flash, or any other suitable memory device, alone or in combination with other data storage devices.

Word string generation module 362 is configured to generate a word string intended to be spoken by the user for registration and/or authentication purposes. Word string generation module 362 may generate a random word string and transmit it to communication device 110 (FIG. 1) via I/O interface 310 so that communication device 110 (FIG. 1) may display the randomly generated word string to the user via display 230 (FIG. 2). Word string generation module 362 may generate word strings from a set of possible word strings large enough that it is highly unlikely that an individual user will be prompted more than once for the same set of words or word strings. In some embodiments, the random word strings generated by word string generation module 362 may be relatively short in length. In some embodiments, the word string generation module 362 may generate a single random word in combination with a fixed word string.
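The generation of a prompt that combines a fixed word string with a short random portion can be illustrated with a minimal Python sketch. The vocabulary, the fixed prefix, and the function name generate_prompt are assumptions made only for illustration; a practical vocabulary would need to be far larger to make repeated prompts unlikely.

```python
import secrets

# Hypothetical vocabulary; a real deployment would draw from a much larger set.
VOCABULARY = ["tree", "leaves", "cat", "dog", "river", "stone", "cloud", "lamp"]
FIXED_PREFIX = "Please repeat the words"


def generate_prompt(random_word_count=2, vocabulary=VOCABULARY):
    """Return (prompt, random_words): a fixed prefix followed by random words."""
    if not 1 <= random_word_count <= len(vocabulary):
        raise ValueError("random_word_count must fit within the vocabulary")
    pool = list(vocabulary)
    words = []
    for _ in range(random_word_count):
        # Draw without replacement using a cryptographically strong source.
        words.append(pool.pop(secrets.randbelow(len(pool))))
    random_part = " ".join(word.upper() for word in words)
    return f"{FIXED_PREFIX} {random_part}", words
```

The random words are returned alongside the prompt so that the later reproduction check can compare against them.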

Supplemental signal transmission module 364 is configured to transmit a supplemental signal from the supplemental signal database 340 to the communication device 110 (FIG. 1) via the input/output interface 310. As described above, the supplemental signal database 340 stores the supplemental signals to be played back by the communication device 110 (FIG. 1) simultaneously with the user performing voice authentication. In some embodiments, the supplemental signal transmission module 364 may alter and/or manipulate the supplemental signal prior to transmitting the supplemental signal to the communication device 110 (FIG. 1). For example, the supplemental signal transmission module may change a frequency, pitch, speed, etc. of the supplemental signal prior to transmission.
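One way to picture altering a supplemental signal before transmission is to shift the frequency of a stored tone by a random amount for each transaction. The Python sketch below assumes the supplemental signal is a pure sine tone sampled at 44,100 Hz; the sample rate, the shift range, and the function names are hypothetical. A random shift alone does not guarantee uniqueness, so a real system would also need to record which variants have already been issued.

```python
import math
import secrets

SAMPLE_RATE = 44100  # assumed sample rate in Hz


def make_tone(frequency_hz, duration_s=0.05, sample_rate=SAMPLE_RATE):
    """Return sine-wave samples in the range [-1.0, 1.0]."""
    count = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * frequency_hz * i / sample_rate)
            for i in range(count)]


def vary_for_transaction(base_frequency_hz, max_shift_hz=500):
    """Shift the base frequency by a random whole number of hertz."""
    shift = secrets.randbelow(2 * max_shift_hz + 1) - max_shift_hz
    frequency = base_frequency_hz + shift
    return frequency, make_tone(frequency)
```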

Word string reproduction determination module 366 is configured to determine whether the captured audio data from the user is an accurate reproduction of the word string generated by word string generation module 362. In some embodiments, word string reproduction determination module 366 may include speech recognition technology operable for determining whether the captured audio data matches the words/prompts that were prompted for/generated by word string generation module 362.
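Assuming a speech recognizer (not shown, and not specified by this description) has already produced a text transcript of the captured audio, the reproduction check reduces to comparing two word sequences. A minimal Python sketch, with hypothetical function names:

```python
import re


def normalize(text):
    """Lower-case the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def is_accurate_reproduction(prompted_words, transcript):
    """Return True when the transcript contains exactly the prompted words, in order."""
    return normalize(prompted_words) == normalize(transcript)
```

An exact comparison is the strictest choice; a deployed system might tolerate small recognition errors instead.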

Supplemental signal match determination module 368 is configured to calculate a match score associated with the supplemental signal received by the server computer 300, as part of the audio segment. As described above, the communication device 110 may play back a supplemental signal while the user is performing voice authentication. As the user performs the voice authentication by speaking a reproduction of a prompted word string, the supplemental signal being played back by the communication device 110 (FIG. 1) is recorded via the microphone 220 (FIG. 2) along with the user's voice reproduction of the word string, collectively known as the audio segment. The audio segment may then be sent by the communication device 110 (FIG. 1) to the server computer 300. The supplemental signal match determination module 368 may determine whether the received supplemental signal within the received audio segment matches the supplemental signal initially sent to the communication device 110 (FIG. 1) by the server computer 300. The determination may be based on a match score (typically between 0 and 100), where the score expresses a degree of confidence that the received supplemental signal matches the initially sent supplemental signal and ultimately that the user attempting to authenticate is the genuine user and not a fraudster attempting a replay attack. This score can be passed on to other parts of a risk scoring mechanism, such that the score, along with other risk parameters, contributes to the overall decision of approving or declining the transaction.
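As a rough illustration of a 0 to 100 match score, the Python sketch below assumes the supplemental signal is a single sine tone whose frequency and amplitude are known to the server. It estimates how strongly that tone is present in the recorded audio segment using a single-bin discrete Fourier transform. The scoring rule and all names are assumptions; the description above does not prescribe a particular matching technique.

```python
import math


def tone_amplitude(samples, frequency_hz, sample_rate):
    """Estimate the amplitude of one frequency component (single-bin DFT)."""
    count = len(samples)
    if count == 0:
        return 0.0
    step = 2 * math.pi * frequency_hz / sample_rate
    real = sum(s * math.cos(step * i) for i, s in enumerate(samples))
    imag = sum(s * math.sin(step * i) for i, s in enumerate(samples))
    return 2 * math.hypot(real, imag) / count


def match_score(recorded, sent_frequency_hz, sent_amplitude, sample_rate=44100):
    """Return a 0-100 score for how strongly the sent tone appears in the recording."""
    measured = tone_amplitude(recorded, sent_frequency_hz, sample_rate)
    return round(100 * min(measured / sent_amplitude, 1.0))
```

In practice the speaker and microphone attenuate the tone, so the expected amplitude would have to be calibrated per device rather than taken as the transmitted value.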

It can be appreciated that in some embodiments the server computer 300 may reside within the payment processing network 140 (FIG. 1).

FIG. 4 is a flow diagram illustrating a method for authenticating a user 401 for a transaction via a communication device 110, according to an embodiment of the present invention. The voice authentication system 400 includes a communication device 110 (e.g., mobile phone), acquirer 130, payment processing network 140, issuer 150, and a server computer 300. It can be appreciated that while the server computer 300 is shown external to the payment processing network 140, in some embodiments the server computer 300 may reside within the payment processing network 140. The voice authentication system 400 provides a mechanism to authenticate a user 401 using voice authentication techniques, specifically by playing back and recording a supplemental signal substantially simultaneously with the user performing voice authentication.

The user 401 may use his/her communication device 110 to initiate a financial transaction. In one embodiment, the user may activate an application on the communication device 110 or dial a specific number in order to initiate the financial transaction. In other embodiments, the user may use the communication device 110 that is equipped with near-field communication technology (NFC) to initiate a financial transaction at an access device 120 (FIG. 1). The access device 120 (FIG. 1) may be located at a merchant site and the user may communicatively couple the communication device 110 to the access device 120 (FIG. 1) to initiate the financial transaction. In one embodiment, the financial transaction may be initiated in order to pay for an item or service, from the merchant, purchased by the user 401. Once the financial transaction is initiated, the user may be asked to provide a voice sample by speaking a certain word or phrase into the microphone of communication device 110 (step 402). For example, the voice authentication system 400 may prompt the user 401, via display 230 (FIG. 2) of communication device 110, to speak a random word string, such as “The quick brown fox jumps over the lazy dog”, or the user 401 may navigate the application using voice, which may be captured and used. In some embodiments of the invention, the prompted word string is less than about 7 words in length, and preferably five or fewer words in length. By keeping the lengths of the prompts short, users are less frustrated and are more likely to use the systems and methods according to embodiments of the invention.

In another example, the voice authentication system 400 may prompt the user 401 with a prompt having a variable or random element. In some embodiments, the prompt may have both a random element and a fixed element, with the fixed element being greater than the random element. In some embodiments, the fixed element can have a length of 7, 5, 3, or 2 words or less, while the random element may have a length of 5, 3, or 2 words or less. For example, embodiments of the invention may provide a first word string prompt such as “Please repeat the words TREE LEAVES” and subsequently a second prompt such as “Please repeat the words CAT AND DOG”. The phrase “Please repeat the words” may be a fixed portion of the prompt, while the words “TREE LEAVES” and “CAT AND DOG” may be random or variable portions of the prompt.

In some embodiments, the request to the user 401 to speak the prompt may be displayed on the communication device 110. The voice authentication system 400 is described in further detail below.

Prior to the user 401 actually speaking the prompted word string, the communication device 110 may receive a supplemental signal from the server computer 300 (step 404). In some embodiments, the supplemental signal may include an audible or inaudible sound. As described above, the supplemental signal may be retrieved from the supplemental signal database 340 and transmitted to the communication device 110 via the supplemental signal transmission module 364 (FIG. 3) (step 403). In some embodiments, the supplemental signal transmission module 364 (FIG. 3) may alter and/or manipulate the supplemental signal prior to transmitting the supplemental signal to the communication device 110. For example, the supplemental signal transmission module 364 (FIG. 3) may change a frequency, pitch, speed, etc. of the supplemental signal prior to transmission. It can be appreciated that the supplemental signal transmitted to the communication device 110 may be unique to the particular transaction. Further, by altering and/or manipulating the supplemental signal prior to transmitting, it may be ensured that the same supplemental signal is never used for more than one transaction.

The user may then speak the prompted word string into the microphone of the communication device 110 (step 406). For example, the user may speak “The quick brown fox jumps over the lazy dog”, similar to the example of the prompted word string provided above. Concurrently or substantially simultaneously with the user 401 speaking the prompted word string, the communication device 110 can emit the supplemental signal from the speaker of communication device 110 (step 408). For example, the communication device 110 may emit an inaudible sound that cannot be heard by a human, such as a sound having a frequency of less than 20 Hz or more than 20,000 Hz. The microphone of the communication device 110 can concurrently or substantially simultaneously capture this inaudible sound along with the prompted word string spoken by the user 401 (step 410). That is, while the speaker 250 (FIG. 2) of the communication device 110 is playing the supplemental signal (e.g., the inaudible sound), the microphone 220 (FIG. 2) of the communication device 110 may concurrently or substantially simultaneously capture both the supplemental signal and the user's attempt to reproduce the prompted word string.
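The concurrent capture can be modeled very roughly in Python by treating the microphone input as the sample-by-sample sum of the user's voice and the supplemental signal. This is a simplification that ignores room acoustics, speaker and microphone frequency response, and clipping; the function names are hypothetical.

```python
from itertools import zip_longest


def is_inaudible(frequency_hz):
    """Apply the stated limits: below 20 Hz or above 20,000 Hz."""
    return frequency_hz < 20 or frequency_hz > 20000


def capture(voice_samples, supplemental_samples):
    """Model concurrent capture as the sum of both sources, padding the shorter one."""
    return [v + s for v, s in
            zip_longest(voice_samples, supplemental_samples, fillvalue=0.0)]
```

The list returned by capture stands in for the audio segment that is later sent for authentication.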

The communication device 110 may send the captured user voice data and supplemental signal (collectively referred to as the audio segment) along with any transaction information to an acquirer 130 (step 412). In some embodiments, the communication device 110 may send the audio segment and transaction information to the acquirer 130 via an access device (not shown). Acquirer 130 may then forward the audio segment and transaction information to the payment processing network 140 (step 414). In some embodiments, the audio segment received by payment processing network 140 from communication device 110 may be included in the payment authorization request message described herein.

The payment processing network 140 may then authenticate the user 401 based on the received audio segment. The payment processing network 140 may establish a communication with the server computer 300 in order to authenticate the user 401 (step 416). In some embodiments, the server computer 300 may reside within the payment processing network 140. In a particular embodiment, the payment processing network 140 may consult with the server computer 300 using the received audio segment (including the user voice data and the supplemental signal). The server computer 300 may determine whether the supplemental signal initially sent by the server computer 300 to the communication device 110 (in step 404) matches the supplemental signal received from the payment processing network 140 (step 420). Additionally, the server computer 300 may determine whether the user voice data received in the audio segment matches a voice profile of the user 401 stored in the user voice profile database 350 (step 418). The server computer 300 may parse and analyze the user voice data in order to make this determination. If the server computer 300 determines a match of the user voice data, the user voice data may be stored in the user voice profile database 350, and the stored voice profile may be updated with the currently received user voice data and used for future evaluation of voice data from the user 401.

In some embodiments, the server computer 300 provides a pass/fail score that can be used as what is known as a “Cardholder Verification Mechanism” (CVM) in certain standards, alongside other CVMs such as a Personal Identification Number (PIN) or signature. In this mode, an access device (e.g., at merchant) or the communication device 110 will be informed as to whether voice authentication has passed or failed, and can make a decision whether to proceed with the transaction. The determination of the pass/fail score may be made by the server computer 300 upon receiving the voice data and supplemental signal in the audio segment. In some embodiments, the pass/fail response may be based on whether the user 401 voice data is consistent with previous user voice data stored in the user voice profile database 350 and whether the supplemental signal received in the audio segment matches the supplemental signal originally sent by the server computer 300 to the communication device 110. If the above conditions are true, the response may indicate that authentication has passed.
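The pass/fail determination described above requires both conditions to hold. A minimal Python sketch, in which the class and field names are assumptions made for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceAuthResult:
    """Outcome of the two checks performed on a received audio segment."""
    voice_matches_profile: bool     # voice data consistent with the stored profile
    signal_matches_sent: bool       # received supplemental signal matches the one sent

    @property
    def passed(self):
        # Authentication passes only when both conditions are true.
        return self.voice_matches_profile and self.signal_matches_sent
```

For instance, a replayed recording of the genuine user's voice would be expected to satisfy the first check but fail the second, so passed would be False.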

In some embodiments, the server computer 300 provides a score (typically between 0 and 100), where the score expresses a degree of confidence that a user 401 is the genuine user. This score can be passed on to other parts of a risk scoring mechanism (one run by the payment processor, by the issuer, or by a third party), such that the score, along with other risk parameters, contributes to the overall decision of approving or declining the transaction. In some embodiments, the match score is based on how closely the captured user voice data matches previously captured user voice data stored in the user voice profile database 350. That is, the score reflects how closely the current voice data matches previously obtained voice data from the user. This may be determined by analyzing features of the voice sample such as, but not limited to, tone, pitch, etc. Additionally, the score may also express a degree of confidence that the supplemental signal received in the audio segment matches the supplemental signal originally sent by the server computer 300 to the communication device 110.
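The description does not state how a voice match score and a supplemental signal match score would be merged into one value. One hypothetical option, shown here only as a worked example, is a weighted average on the same 0 to 100 scale; the function name and the default weight are assumptions.

```python
def combined_score(voice_score, signal_score, voice_weight=0.5):
    """Blend two 0-100 scores into one 0-100 score using a weighted average."""
    for score in (voice_score, signal_score):
        if not 0 <= score <= 100:
            raise ValueError("scores must be between 0 and 100")
    if not 0 <= voice_weight <= 1:
        raise ValueError("voice_weight must be between 0 and 1")
    return voice_weight * voice_score + (1 - voice_weight) * signal_score
```

With equal weights, a voice score of 80 and a signal score of 100 give 0.5 × 80 + 0.5 × 100 = 90. Taking the minimum of the two scores would be a stricter alternative.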

In another embodiment, communication device 110 may have a repository of supplemental signals stored in its memory. When communication device 110 is registered with the payment processing network 140 and/or issuer 150, it may be provided with numerous supplemental signals that can be stored by communication device 110. These supplemental signals can be associated with the user 401 account. In this instance, when the user 401 speaks the prompted word string, communication device 110 can select one of the stored supplemental signals and output it via its speaker 250 (FIG. 2) to be captured by the microphone 220 (FIG. 2) along with the user's 401 spoken word or phrase. In this instance, the voice authentication can occur by comparing the captured supplemental signal with the supplemental signal pre-associated with the user account and checking whether the captured supplemental signal matches one of the supplemental signals associated with the user account.
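The repository variant can be sketched in Python under the simplifying assumption that each stored supplemental signal is identified by a single tone frequency, and that the dominant frequency of the captured signal has already been measured. The class name, the tolerance, and the representation of a signal as one frequency are all hypothetical.

```python
import secrets


class SignalRepository:
    """Supplemental signals pre-associated with one user account."""

    def __init__(self, frequencies_hz):
        self._frequencies = list(frequencies_hz)

    def select(self):
        """Pick one stored signal for playback during voice capture."""
        return secrets.choice(self._frequencies)

    def matches_any(self, captured_frequency_hz, tolerance_hz=5.0):
        """Check the captured signal against every signal on the account."""
        return any(abs(captured_frequency_hz - stored) <= tolerance_hz
                   for stored in self._frequencies)
```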

Once the payment processing network 140 authenticates the user based on the received audio segment and other financial information, it can communicate with issuer 150 to complete the transaction (step 422).

In another embodiment, communication device 110 may directly communicate with server computer 300 to provide both the user voice data and the supplemental signal. In this instance, server computer 300 may work in conjunction with payment processing network 140 to authenticate the user using one or more of the techniques described above. Thus, in this instance, the voice/sound based authentication can occur independently of the standard authentication process that is currently in use or any other type of authentication, e.g., communication device ID based authentication.

The supplemental signal can be transmitted to and from the communication device 110 in various ways. In one embodiment, the server computer 300 can transmit the supplemental signal as an audio signal to communication device 110 and the communication device 110 can receive the supplemental signal via its audio channel. The communication device 110 can output the supplemental signal, via its speaker, and the microphone can capture the supplemental signal. The communication device 110 can then digitize the supplemental signal and transmit that over another channel (e.g., data channel) to the server computer 300. In another embodiment, payment processing network 140 can transmit the supplemental signal in a digital format to the communication device 110 and the communication device can convert and output the supplemental signal in an analog format. The microphone of the communication device 110 can capture the analog supplemental signal, convert it back to a digital format, and send it to the payment processing network 140 via an access device (not shown) along with the user's voice data. The payment processing network 140 can then compare the digital version of the supplemental signal that it sent to the communication device 110 with the digital version of the supplemental signal received from the communication device 110 to perform the authentication, as described above.

In some embodiments, the server computer 300 may be part of issuer 150 and the comparison and authentication can be performed by issuer 150.

Accordingly, the voice authentication system 400 provides an added layer of assurance that can be selectively drawn upon, e.g., for transactions that are deemed risky and which may otherwise have a high likelihood of declines (“Step-up authentication”). The added layer of assurance derives from the supplemental signal playing concurrently or substantially simultaneously with the capture of the user voice data. Since the supplemental signal is unique for each transaction, the likelihood of a fraudster capturing the supplemental signal and using it in a replay attack to fraudulently authenticate is drastically reduced. In particular, such an option may be attractive if the communication device 110 is used to initiate the transaction, since in this case the communication device 110 may not play the role of a secondary channel or device if it is already the payment instrument for that transaction. In instances where the supplemental signal is an inaudible sound, the fraudster may not even be aware of the supplemental signal that is played by the communication device 110 concurrently with the capture of the user voice data.

Since the supplemental signal is unique to each payment transaction, in some embodiments, the supplemental signal may be used as a token for the user's 401 primary account number (PAN). That is, the supplemental signal may serve as a unique token for the particular transaction and the token may be used in the standard payment authorization process.

In some embodiments, the payment processing network 140, a payment processor, or a third party may provide a channel (not shown in FIG. 1) through which to prompt the user 401 during a payment transaction. The prompt may be displayed on the communication device 110 and may request that the user 401 speak certain words or prompts. As described above, the recording of the user 401 speaking those words or prompts may then be transmitted to the server computer 300, which may perform the voice authentication.

The channel through which the user 401 is prompted may utilize an application on his/her communication device 110 (e.g., mobile phone), an Internet session using a browser or app on their phone or PC, or some other mechanism that allows the prompted words or prompts to be displayed or played and allows the user's 401 voice to be recorded (via a microphone on communication device 110) and transmitted to the server computer 300 (via the payment processing network 140). In some embodiments, the prompts may be displayed visually on communication device 110. In some embodiments, the prompts may be played audibly on communication device 110.

FIG. 5 is a flow diagram illustrating a method for authenticating a user 401 for a transaction via an access device 120, according to an embodiment of the present invention. FIG. 5 is similar to FIG. 4 except that the voice authentication system 500 includes an access device 120 in place of the communication device 110 (FIG. 4).

The voice authentication system 500 includes an access device 120 (e.g., POS terminal), acquirer 130, payment processing network 140, issuer 150, and a server computer 300. It can be appreciated that while the server computer 300 is shown external to the payment processing network 140, in some embodiments the server computer 300 may reside within the payment processing network 140. The voice authentication system 500 provides a mechanism to authenticate a user 401 using voice authentication techniques, specifically by playing back and recording a supplemental signal substantially simultaneously with the user performing voice authentication.

The user 401 may use the access device 120 (e.g., a POS terminal at a merchant site) to initiate a financial transaction. In some embodiments, the user may use a communication device that is equipped with near-field communication technology (NFC) to interact with the access device 120 and initiate the financial transaction. The access device 120 (FIG. 1) may be located at a merchant site and the user may communicatively couple the communication device to the access device 120 (FIG. 1) to initiate the financial transaction. In other embodiments, the user may swipe his/her payment card at the access device 120 to initiate the transaction. In one embodiment, the financial transaction may be initiated in order to pay for an item or service, from the merchant, purchased by the user 401. Once the financial transaction is initiated, the user may be asked to provide a voice sample by speaking a certain word or phrase into the microphone of the access device 120 (step 402). For example, the voice authentication system 500 may prompt the user 401, via a display on the access device 120, to speak a random word string, such as “The quick brown fox jumps over the lazy dog”. In some embodiments of the invention, the prompted word string is less than about 7 words in length, and preferably five or fewer words in length. By keeping the lengths of the prompts short, users are less frustrated and are more likely to use the systems and methods according to embodiments of the invention.

In another example, the voice authentication system 500 may prompt the user 401 with a prompt having a variable or random element. In some embodiments, the prompt may have both a random element and a fixed element, with the fixed element being greater than the random element. In some embodiments, the fixed element can have a length of 7, 5, 3, or 2 words or less, while the random element may have a length of 5, 3, or 2 words or less. For example, embodiments of the invention may provide a first word string prompt such as “Please repeat the words TREE LEAVES” and subsequently a second prompt such as “Please repeat the words CAT AND DOG”. The phrase “Please repeat the words” may be a fixed portion of the prompt, while the words “TREE LEAVES” and “CAT AND DOG” may be random or variable portions of the prompt.

In some embodiments, the request to the user 401 to speak the prompt may be displayed on the access device 120. The voice authentication system 500 is described in further detail below.

Prior to the user 401 actually speaking the prompted word string, the access device 120 may receive a supplemental signal from the server computer 300 (step 404). In some embodiments, the supplemental signal may include an audible or inaudible sound. As described above, the supplemental signal may be retrieved from the supplemental signal database 340 and transmitted to the access device 120 via the supplemental signal transmission module 364 (FIG. 3) (step 403). In some embodiments, the supplemental signal transmission module 364 (FIG. 3) may alter and/or manipulate the supplemental signal prior to transmitting the supplemental signal to the access device 120. For example, the supplemental signal transmission module 364 (FIG. 3) may change a frequency, pitch, speed, etc. of the supplemental signal prior to transmission. It can be appreciated that the supplemental signal transmitted to the access device 120 may be unique to the particular transaction. Further, by altering and/or manipulating the supplemental signal prior to transmitting, it may be ensured that the same supplemental signal is never used for more than one transaction.

The user may then speak the prompted word string into the microphone of the access device 120 (step 406). For example, the user may speak “The quick brown fox jumps over the lazy dog”, similar to the example of the prompted word string provided above. Concurrently or substantially simultaneously with the user 401 speaking the prompted word string, the access device 120 can emit the supplemental signal from the speaker of the access device 120 (step 408). For example, the access device 120 may emit an inaudible sound that cannot be heard by a human, such as a sound having a frequency of less than 20 Hz or more than 20,000 Hz. The microphone of the access device 120 can concurrently or substantially simultaneously capture this inaudible sound along with the prompted word string spoken by the user 401 (step 410). That is, while the speaker of the access device 120 is playing the supplemental signal (e.g., the inaudible sound), the microphone of the access device 120 may concurrently or substantially simultaneously capture both the supplemental signal and the user's attempt to reproduce the prompted word string.

The access device 120 may send the captured user voice data and supplemental signal (collectively referred to as the audio segment) along with any transaction information to an acquirer 130 (step 412). Acquirer 130 may then forward the audio segment and transaction information to the payment processing network 140 (step 414). In some embodiments, the audio segment received by payment processing network 140 from access device 120 may be included in the payment authorization request message described herein.

The payment processing network 140 may then authenticate the user 401 based on the received audio segment. The payment processing network 140 may establish a communication with the server computer 300 in order to authenticate the user 401 (step 416). In some embodiments, the server computer 300 may reside within the payment processing network 140. In a particular embodiment, the payment processing network 140 may consult with the server computer 300 using the received audio segment (including the user voice data and the supplemental signal). The server computer 300 may determine whether the supplemental signal initially sent by the server computer 300 to the access device 120 (in step 404) matches the supplemental signal received from the payment processing network 140 (step 420). Additionally, the server computer 300 may determine whether the user voice data received in the audio segment matches a voice profile of the user 401 stored in the user voice profile database 350 (step 418). The server computer 300 may parse and analyze the user voice data in order to make this determination. If the server computer 300 determines a match of the user voice data, the user voice data may be stored in the user voice profile database 350, and the stored voice profile may be updated with the currently received user voice data and used for future evaluation of voice data from the user 401.

In some embodiments, the server computer 300 provides a pass/fail score that can be used as what is known as a “Cardholder Verification Mechanism” (CVM) in certain standards, alongside other CVMs such as a Personal Identification Number (PIN) or signature. In this mode, the access device 120 (e.g., at merchant) will be informed as to whether voice authentication has passed or failed, and can make a decision whether to proceed with the transaction. The determination of the pass/fail score may be made by the server computer 300 upon receiving the voice data and supplemental signal in the audio segment. In some embodiments, the pass/fail response may be based on whether the user 401 voice data is consistent with previous user voice data stored in the user voice profile database 350 and whether the supplemental signal received in the audio segment matches the supplemental signal originally sent by the server computer 300 to the access device 120. If the above conditions are true, the response may indicate that authentication has passed.

In some embodiments, the server computer 300 provides a score (typically between 0 and 100), where the score expresses a degree of confidence that a user 401 is the genuine user. This score can be passed on to other parts of a risk scoring mechanism (one run by the payment processor, by the issuer, or by a third party), such that the score, along with other risk parameters, contributes to the overall decision of approving or declining the transaction. In some embodiments, the match score is based on how closely the captured user voice data matches previously captured user voice data stored in the user voice profile database 350. That is, the score reflects how closely the current voice data matches previously obtained voice data from the user. This may be determined by analyzing features of the voice sample such as, but not limited to, tone, pitch, etc. Additionally, the score may also express a degree of confidence that the supplemental signal received in the audio segment matches the supplemental signal originally sent by the server computer 300 to the access device 120.

In another embodiment, access device 120 may have a repository of supplemental signals stored in its memory. When access device 120 is registered with the payment processing network 140 and/or issuer 150, it may be provided with numerous supplemental signals that can be stored by access device 120. These supplemental signals can be associated with the user 401 account. In this instance, when the user 401 speaks the prompted word string, access device 120 can select one of the stored supplemental signals and output it via its speaker to be captured by the microphone along with the user's 401 spoken word or phrase. In this instance, the voice authentication can occur by comparing the captured supplemental signal with the supplemental signal pre-associated with the user account and checking whether the captured supplemental signal matches one of the supplemental signals associated with the user account.

Once the payment processing network 140 authenticates the user based on the received audio segment and other financial information, it can communicate with issuer 150 to complete the transaction (step 422). For example, the authorization request message may be reformatted without the audio segment data and transmitted to the issuer 150 for approval. The issuer may respond to the authorization request message with an authorization response message indicating approval or disapproval of the transaction. This authorization response message may be transmitted back to the access device 120 via the payment processing network 140 and the acquirer 130. At the end of the day or at any other suitable time, a settlement and clearing process can occur.
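Reformatting the authorization request message without the audio segment data can be pictured, in a simplified Python sketch, as copying the message while omitting one field. The dictionary representation and the field name audio_segment are assumptions; actual authorization request messages follow the message formats used by the payment processing network.

```python
def reformat_for_issuer(authorization_request, audio_field="audio_segment"):
    """Return a copy of the request without the audio segment data."""
    return {key: value for key, value in authorization_request.items()
            if key != audio_field}
```

The original message is left unmodified, so the payment processing network can still retain the audio segment for its own records if desired.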

In another embodiment, access device 120 may directly communicate with server computer 300 to provide both the user voice data and the supplemental signal. In this instance, server computer 300 may work in conjunction with payment processing network 140 to authenticate the user using one or more of the techniques described above. Thus, in this instance, the voice/sound based authentication can occur independently of the standard authentication process that is currently in use or any other type of authentication, e.g., access device ID based authentication.

The supplemental signal can be transmitted to and from the access device 120 in various ways. In one embodiment, the server computer 300 can transmit the supplemental signal as an audio signal to access device 120 and the access device 120 can receive the supplemental signal via its audio channel. The access device 120 can output the supplemental signal, via its speaker, and the microphone can capture the supplemental signal. The access device 120 can then digitize the supplemental signal and transmit that over another channel (e.g., data channel) to the server computer 300. In another embodiment, payment processing network 140 can transmit the supplemental signal in a digital format to the access device 120 and the access device can convert and output the supplemental signal in an analog format. The microphone of the access device 120 can capture the analog supplemental signal, convert it back to a digital format, and send it to the payment processing network 140 along with the user's voice data. The payment processing network 140 can then compare the digital version of the supplemental signal that it sent to the access device 120 with the digital version of the supplemental signal received from the access device 120 to perform the authentication, as described above.
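The description does not specify how the re-digitized supplemental signal is compared with the one that was sent. One possible approach, shown here purely as an assumption, is to test whether an expected tone frequency is present in the captured samples using the Goertzel algorithm. The function names and the threshold are hypothetical.

```python
import math
from typing import Sequence


def tone_power(samples: Sequence[float], freq_hz: float, sample_rate_hz: float) -> float:
    """Normalized power of one frequency component, computed with the Goertzel algorithm.

    For a pure sine of amplitude A at freq_hz the result is approximately A**2.
    """
    n = len(samples)
    if n == 0:
        return 0.0
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    return power / (n / 2.0) ** 2


def contains_tone(samples: Sequence[float], freq_hz: float,
                  sample_rate_hz: float, threshold: float = 0.01) -> bool:
    """Report whether the captured audio carries the expected tone (threshold is assumed)."""
    return tone_power(samples, freq_hz, sample_rate_hz) >= threshold
```

A deployed system would also have to separate the tone from the user's speech and from background noise, which this sketch does not attempt.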

In some embodiments, the server computer 300 may be part of issuer 150 and the comparison and authentication can be performed by issuer 150.

Accordingly, the voice authentication system 500 provides an added layer of assurance that can be selectively drawn upon, e.g., for transactions that are deemed risky and which may otherwise have a high likelihood of declines (“Step-up authentication”). The added layer of assurance derives from the supplemental signal playing concurrently or substantially simultaneously with the capture of the user voice data. Since the supplemental signal is unique for each transaction, the likelihood of a fraudster capturing the supplemental signal and using it in a replay attack to fraudulently authenticate is drastically reduced. In particular, such an option may be attractive if the access device 120 is used to initiate the transaction, since in this case the access device 120 may not play the role of a secondary channel or device if it is already the payment instrument for that transaction. In instances where the supplemental signal is an inaudible sound, the fraudster may not even be aware of the supplemental signal that the access device 120 plays while capturing the user voice data.

Since the supplemental signal is unique to each payment transaction, in some embodiments, the supplemental signal may be used as a token for the user's 401 primary account number (PAN). That is, the supplemental signal may serve as a unique token for the particular transaction and the token may be used in the standard payment authorization process.

In some embodiments, the payment processing network 140, a payment processor, or a third party may provide a channel (not shown in FIG. 1) through which to prompt the user 401 during a payment transaction. The prompt may be displayed on the access device 120 and may request that the user 401 speak certain words or prompts. As described above, the recording of the user 401 speaking those words or prompts may then be transmitted to the server computer 300, which may perform the voice authentication.

The channel through which the user 401 is prompted may utilize an application on his/her access device 120 (e.g., mobile phone), an Internet session using a browser or app on their phone or PC, or some other mechanism that allows the prompted words or prompts to be displayed or played and allows the user's 401 voice to be recorded (via a microphone on access device 120) and transmitted to the server computer 300 (via the payment processing network 140). In some embodiments, the prompts may be displayed visually on access device 120. In some embodiments, the prompts may be played audibly on access device 120.

FIG. 6A shows a screenshot of initial voice authentication enrollment on a communication device 110, according to an embodiment of the present invention. The screenshot shows an example of a prompt, presented on the display 230, for user 401 (FIG. 4) enrollment in the voice authentication system 400 (FIG. 4) that may be displayed on the communication device 110 (FIG. 1). FIG. 6A illustrates the first step in initial enrollment with the voice authentication system 400. During enrollment, no match scoring is calculated. Instead, the captured voice data of the user 401 (FIG. 4) may be used to build the user's voice profile eventually stored in the user voice profile database 350 (FIG. 4). These voice recordings may be submitted to the server computer 300 (FIG. 4) as enrollment recordings, and the server computer 300 (FIG. 4) may create a profile for the user 401 (FIG. 4) and store this profile for future reference, the profile being linked to the user 401 (FIG. 4). In some embodiments, the profile may be stored within the user voice profile database 350 (FIG. 4) within the server computer 300 (FIG. 4).

During initial enrollment, the prompt may ask the user for their gender, age, and/or native language. This information about the user 401 (FIG. 4) may be stored in the user voice profile database 350 (FIG. 4) within the server computer 300 (FIG. 4) and used for parsing and analyzing subsequently received user voice data.

FIG. 6B shows a screenshot of capturing voice data for voice authentication enrollment, according to an embodiment of the present invention. After the user 401 (FIG. 4) enters his/her user details, as described above, the voice authentication system 400 (FIG. 4) may further request that the user 401 (FIG. 4) speak a specified word string 610. For example, the word string 610 may request that the user 401 (FIG. 4) speak the words, “the story behind each one of them.” It can be appreciated that the enrollment word string 610 may be different for each user 401 (FIG. 4) or different for each enrollment word string. It can further be appreciated that the enrollment word string 610 may differ from the voice authentication word string (see below). Further, the user 401 (FIG. 4) may be required to speak multiple word strings prior to completing enrollment with the voice authentication system 400 (FIG. 4).

In some embodiments, the user 401 (FIG. 4) may be able to select whether any type of background noises, such as other individuals speaking, music, etc., exist at the time the user 401 (FIG. 4) is speaking the specific word string 610 for enrollment. The user 401 (FIG. 4) may speak the specific word string 610 and his/her voice may be captured by microphone 240. If the user 401 (FIG. 4) indicated that any background noises were present, the voice authentication system 400 may try to filter out the background noises prior to transmitting the audio segment (including the supplemental signal and user voice data) to the server computer 300 (FIG. 4).

FIG. 6C shows a screenshot of progressive feedback while capturing user voice data for voice authentication enrollment, according to an embodiment of the present invention. As described above, the voice authentication system 400 (FIG. 4) may generate a word string 610 for a user 401 (FIG. 4) to repeat for purposes of enrollment with the system. The word string 610 may be generated by the server computer 300 (FIG. 4). In some embodiments, a progressive feedback indicator 620 may be presented on display 230 of communication device 110. The progressive feedback indicator 620 may indicate to the user 401 (FIG. 4) his/her progress in completing repetition of the word string 610 and may also indicate specifics of the analog voice signal being captured by the microphone 240.

FIG. 7A shows a screenshot of voice authentication on a communication device 110 using a first word string 710, according to an embodiment of the present invention. After a sufficient number of successful recordings of word strings for enrollment have been made in order for the user's 401 (FIG. 4) voice profile to be created, subsequent word strings 710 may be used for authentication and each recording of the user 401 (FIG. 4) may be submitted to the server computer 300 (FIG. 4), which may respond with either a match/no match (pass/fail) response or with a match score, as described above.

The prompts for enrollment and for authentication may be quite short in length, in order to make for a positive user 401 (FIG. 4) experience, with each prompt 710 consisting of only a few words. For this reason it is anticipated that the first few recordings may be used for enrollment (building the profile of the user's voice) before any authentication, as shown in FIG. 7A, may take place. The user 401 (FIG. 4) may not have to be made aware of any difference between enrollment and authentication. The user 401 (FIG. 4) may simply be prompted in connection with a payment transaction and may not necessarily have knowledge as to whether the voice recording may have played any role in the approval or decline of the transaction. For example, as illustrated in FIG. 6B, the user 401 (FIG. 4) may be prompted to speak “the story behind each one of them,” for enrollment. Similarly, as illustrated in FIG. 7A, the user 401 (FIG. 4) may be prompted to speak “the soft moss swimming,” for authentication.

In order to prevent prior recordings from being useful for any subsequent authentication (thus, to prevent replay attacks by a fraudster), a supplemental signal may be played by the communication device 110 concurrently with the voice recording. As shown in FIG. 7A, a notification may be presented to the user that the supplemental signal is playing. This may be beneficial when the supplemental signal is an inaudible sound, as it provides the user with reassurance that the added security measure is in fact functioning. As mentioned above, the supplemental signal may be unique for each voice authentication attempt and each payment transaction.

The communication device 110 may also function as a quality control gatekeeper that performs speech recognition to recognize recording quality that is so poor (due to noise, low speaker volume, high speaker volume, etc.) that the user 401 (FIG. 4) may be prompted to try again, move to a quieter environment, etc., prior to submitting the recording to the server computer 300 (FIG. 4). This may improve the user 401 (FIG. 4) experience by lowering transaction time for problematic recordings.
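A simple on-device volume check of this kind can be sketched as follows. It is an illustrative assumption, not the speech recognition step itself: samples are taken to be normalized to the range -1.0 to 1.0, and every threshold and name here is hypothetical.

```python
import math
from typing import Sequence


def recording_quality(samples: Sequence[float],
                      min_rms: float = 0.02,
                      max_rms: float = 0.7,
                      clip_level: float = 0.99,
                      max_clipped_fraction: float = 0.01) -> str:
    """Classify a recording as 'ok', 'too_quiet', or 'too_loud' (thresholds are assumed)."""
    if not samples:
        return "too_quiet"
    # Root-mean-square level as a rough proxy for speaker volume.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Fraction of samples at or near full scale, which indicates clipping.
    clipped = sum(1 for s in samples if abs(s) >= clip_level) / len(samples)
    if rms < min_rms:
        return "too_quiet"
    if rms > max_rms or clipped > max_clipped_fraction:
        return "too_loud"
    return "ok"
```

A result other than 'ok' could trigger the prompt to try again before anything is submitted to the server computer.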

The word strings may be designed to enable a convenient user 401 (FIG. 4) experience while providing the best feasible security. The word strings may consist of easy-to-pronounce words combined in such a way that each word string may include sufficient phonemes for a high reliability matching to take place. For global solutions, the word strings may be provided in several languages and in any language the cardholder wishes.

For voice authentication matching, the quality of the profile of a user's 401 (FIG. 4) voice may improve as more voice data is collected from the user 401 (FIG. 4). For reasons of customer convenience some embodiments keep each captured voice data short, and thus it may require a (small) number of recordings for the profile to reach an acceptable level of quality. It may be possible to subsequently improve the profile by adding more voice data. Since the user 401 (FIG. 4) may be performing authentications using voice recordings, these recordings may be added to the user's 401 (FIG. 4) voice profile, thereby improving it. Oftentimes, however, where new voice data becomes available after an initial enrollment, it is difficult to determine whether the voice data was that of the legitimate user 401 (FIG. 4) or of a fraudster. Therefore, adding voice prints may not be possible without risking polluting the user's voice profile and actually making it closer to the fraudster's voice. In certain embodiments, however, it is possible to play a unique supplemental signal and capture the supplemental signal concurrently with capturing the user voice data, to prevent a fraudster from performing a replay attack. This same technique and concept may be applied to the initial enrollment as well.

In some embodiments, the user 401 (FIG. 4) may be presented with a progressive feedback indicator 620 indicating progress towards capturing the user's voice for authentication.

FIG. 7B shows a screenshot of speaker verification on a communication device using a second word string 720, according to an embodiment of the present invention. FIG. 7B illustrates a scenario of a user 401 (FIG. 4) attempting to authenticate for a payment transaction different from that in FIG. 7A. As such, the second word string 720 is different from the first word string 710 in FIG. 7A. As described above, each authentication attempt by the user 401 (FIG. 4) may require the user to repeat a different word string. The word string 720 of FIG. 7B requests the user to speak “The flat upright sparkle shines.” Also, the supplemental signal played back in FIG. 7B is different from the one played back in FIG. 7A, as the supplemental signal is unique to each payment transaction. In some embodiments, the user 401 (FIG. 4) may be presented with a progressive feedback indicator 620 indicating progress towards capturing the user's voice for authentication.

FIG. 8 illustrates contents of a supplemental signal database 340, according to an embodiment of the present invention. In some embodiments, the supplemental signal database 340 may reside within the server computer 300 (FIG. 4). The supplemental signal database 340 is configured to store a plurality of supplemental signals to be transmitted during user authentication to the communication device 110 by the server computer 300. The supplemental signal database 340 may include attributes such as, but not limited to, the supplemental signal number, the supplemental signal data, information about whether the supplemental signal is audible or inaudible, and a frequency of the supplemental signal.

FIG. 8 shows data relating to nine different supplemental signals. Each of the nine supplemental signal data sets includes the attribute information mentioned above.

The signal number attribute of the supplemental signal database 340 indicates the number of the supplemental signal stored in the database 340. Each supplemental signal has a unique number assigned to it for identification purposes. It can be appreciated that while only nine supplemental signals are shown stored in the database 340, the database 340 may include millions or tens of millions of supplemental signals.

The time attribute of the user fraud profile 450 indicates the time of day on the date at which the user initiated the particular payment transaction.

The signal data attribute of the supplemental signal database 340 indicates the actual digital binary data that makes up the supplemental signal. As mentioned above, each supplemental signal is unique for each payment transaction and thus the digital binary data will be unique for each transaction. Even if only one bit in the binary data for a supplemental signal differs from that of another supplemental signal, that supplemental signal is unique. The digital binary data may be converted to an analog audible or inaudible sound by the communication device 110 upon receiving the supplemental signal from the server computer 300.

The audible attribute of the supplemental signal database 340 indicates whether the supplemental signal is an audible or inaudible supplemental signal. In some embodiments, the frequency of the supplemental signal may determine whether the supplemental signal is audible or inaudible to the human user 401 (FIG. 4).

The frequency attribute of the supplemental signal database 340 indicates the frequency in Hertz of the supplemental signal. Supplemental signals played outside of the 20 Hz to 20,000 Hz range may be inaudible to the human user 401.
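The four attributes described above can be pictured as a single record, with the audible flag derived from the frequency. This is a minimal sketch; the class and field names are hypothetical, and treating the 20 Hz and 20,000 Hz endpoints as audible is an assumption.

```python
from dataclasses import dataclass

AUDIBLE_LOW_HZ = 20.0
AUDIBLE_HIGH_HZ = 20_000.0


def is_audible(frequency_hz: float) -> bool:
    """Treat frequencies inside the 20 Hz to 20,000 Hz range as audible."""
    return AUDIBLE_LOW_HZ <= frequency_hz <= AUDIBLE_HIGH_HZ


@dataclass(frozen=True)
class SupplementalSignalRecord:
    """One row of a supplemental signal table (field names are assumed)."""
    signal_number: int      # unique identifier of the signal
    signal_data: bytes      # digital binary data making up the signal
    frequency_hz: float     # frequency of the signal in Hertz

    @property
    def audible(self) -> bool:
        # Derived from the frequency rather than stored separately.
        return is_audible(self.frequency_hz)
```

Storing the audible flag as its own column, as the description suggests, would work equally well; deriving it simply keeps the two attributes from disagreeing.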

During user voice authentication, the server computer 300 may access a unique supplemental signal (one that has not been used for authentication before) from the supplemental signal database 340 and transmit it to the communication device 110. As described above, the communication device 110 may play back the received supplemental signal while concurrently or substantially simultaneously capturing the user voice data and the supplemental signal being played back. The communication device 110 may transmit the user voice data and the supplemental signal (collectively, the audio segment) to the payment processing network, which may verify with the server computer 300 that the supplemental signal captured by the communication device 110 does in fact match the supplemental signal originally pulled from the supplemental signal database 340 and transmitted to the communication device 110 by the server computer 300. The server computer 300 may access the supplemental signal database 340 again when making this determination.
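The issue-then-verify cycle described above can be sketched with an in-memory store. This is an illustration under stated assumptions: the class `SupplementalSignalStore`, its methods, and the use of a transaction identifier as the lookup key are all hypothetical, and signals are compared as exact bytes.

```python
import hmac
from typing import Dict


class SupplementalSignalStore:
    """Hands out each stored signal at most once and verifies it at most once."""

    def __init__(self, signals: Dict[int, bytes]) -> None:
        self._unused = dict(signals)           # signal number -> signal data
        self._issued: Dict[str, bytes] = {}    # transaction id -> issued signal data

    def issue(self, transaction_id: str) -> bytes:
        """Select a never-used signal for this transaction and remember it."""
        if transaction_id in self._issued:
            raise ValueError("a signal was already issued for this transaction")
        if not self._unused:
            raise LookupError("no unused supplemental signals remain")
        _number, data = self._unused.popitem()
        self._issued[transaction_id] = data
        return data

    def verify(self, transaction_id: str, captured: bytes) -> bool:
        """Compare the captured signal with the issued one; a second attempt fails."""
        expected = self._issued.pop(transaction_id, None)
        if expected is None:
            return False
        return hmac.compare_digest(expected, captured)
```

Because the issued entry is removed on the first verification, replaying the same captured signal for the same or a later transaction does not succeed.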

II. Exemplary Methods

FIG. 9 is a flow diagram illustrating a method 900 for authenticating a user for a transaction, according to an embodiment of the present invention. The method 900 is performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general purpose computing system or a dedicated machine), firmware (embedded software), or any combination thereof. In certain embodiments, the method 900 is performed by the server computer 300 or the payment processing network 140 of FIG. 4.

The method 900 may begin when a user initiates a financial transaction using his or her communication device. Alternatively, the user may initiate the financial transaction at an access device. Upon the user initiating the financial transaction, the server computer may receive an indication that the financial transaction has been initiated by the user (Step 902).

After the server computer receives an indication of a financial transaction, the server computer may send a message to the communication device instructing the user to speak a word string associated with his/her account (Step 904). The word string may be random and unique to the particular financial transaction. For example, the word string may be “All dogs like to eat.” The user may attempt to reproduce the word string with his or her voice in order to perform voice authentication with the system.

After the server computer sends a message to the communication device instructing the user to speak the word string, the server computer may send a supplemental signal to the communication device (Step 906). The supplemental signal may be retrieved from a supplemental signal database residing within the server computer. The supplemental signal may be outputted by the communication device while the user attempts to perform the voice authentication by speaking the prompted word string. The communication device may then capture the user's rendition of the phrase/code word as well as the supplemental signal. In some embodiments, the supplemental signal may be an inaudible sound or an audible sound.

After the server computer sends the supplemental signal to the communication device, the server computer may receive the user-spoken reproduction of the word string and the supplemental signal from the communication device (Step 908). The user-spoken reproduction of the word string and the supplemental signal may be received in the same audio segment, as they may have been captured simultaneously by the communication device. The supplemental signal may have been captured while it was being outputted by the communication device.

After the server computer receives the audio segment including the user-spoken reproduction of the word string and the supplemental signal, the server computer may then compare the received supplemental signal with the supplemental signal initially sent to the communication device by the server computer (Step 910). The server computer may access the supplemental signal database to determine whether the received supplemental signal matches the initially sent supplemental signal. If the sent supplemental signal and the received supplemental signal do not match, the server computer may indicate to the payment processing network that the user is not authenticated (Step 916). In turn, the payment processing network may deny the payment transaction (Step 918).

If the sent supplemental signal and the received supplemental signal match, the server computer may then compare the received voice data with a user voice profile for the user to determine whether the received voice data is consistent with previous voice data stored for the user (Step 911). The server computer may access entries within a user voice profile database associated with the user to make this determination. If the received voice data is determined to be consistent with the user voice profile, the server computer may authenticate the user (Step 912) and communicate the authentication result to the payment processing network. In turn, the payment processing network may approve and process the transaction (Step 914). If the received voice data is determined not to be consistent with the user voice profile, the server computer may indicate to the payment processing network that the user is not authenticated (Step 920). In turn, the payment processing network may conclude that the user cannot be authenticated and may deny the transaction (Step 922). In some embodiments, the server computer may request the user to re-speak the word string and send another supplemental signal to start the process over.
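The two-stage decision of Steps 910 through 922 can be summarized in a short sketch. It assumes the voice comparison has already produced a 0 to 100 match score and that signals are compared as exact bytes; the threshold value and all names are hypothetical.

```python
import hmac
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    DENY_SIGNAL_MISMATCH = "deny_signal_mismatch"
    DENY_VOICE_MISMATCH = "deny_voice_mismatch"


def authenticate(sent_signal: bytes, received_signal: bytes,
                 voice_score: float, voice_threshold: float = 70.0) -> Decision:
    """Check the supplemental signal first, then the voice match score."""
    # Step 910: the received signal must match the one originally sent.
    if not hmac.compare_digest(sent_signal, received_signal):
        return Decision.DENY_SIGNAL_MISMATCH   # Steps 916 and 918
    # Step 911: the voice data must be consistent with the stored profile.
    if voice_score < voice_threshold:
        return Decision.DENY_VOICE_MISMATCH    # Steps 920 and 922
    return Decision.APPROVE                    # Steps 912 and 914
```

Checking the signal before the voice means a replayed recording is rejected without the voice comparison ever being evaluated.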

In another embodiment, the server computer may send the communication device a string of related or unrelated letters, numbers, words, or a combination thereof. Thereafter, the server computer may highlight or visually emphasize certain letters, words, or numbers from the sent information and ask the user to repeat only the visually emphasized characters. For example, the payment processing network may send the following character string to the communication device: “h3e4lu89l2j6o”. Thereafter, the user may be asked to repeat back only the visually emphasized characters. In this example, the characters “h”, “e”, “l”, “l”, “o” are visually emphasized (bolded), so the user will repeat these characters back to the communication device. The communication device will then capture this audio along with the inaudible sound described above and send that information to the server computer for authentication. The server computer can verify whether the user has correctly repeated the visually highlighted characters and also compare the received inaudible sound with a previously sent inaudible sound to perform user authentication according to any of the techniques described above.
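The emphasized-character check can be illustrated with the example string above. This sketch assumes the emphasized characters are identified by their positions in the string and that a speech recognizer (not shown) has already produced a text transcript; the function names are hypothetical.

```python
from typing import Sequence


def expected_response(challenge: str, emphasized: Sequence[int]) -> str:
    """Collect the emphasized characters, in order of position."""
    return "".join(challenge[i] for i in sorted(emphasized))


def response_matches(challenge: str, emphasized: Sequence[int], transcript: str) -> bool:
    """Compare the transcript with the expected characters, ignoring case and spaces."""
    spoken = "".join(transcript.split()).lower()
    return spoken == expected_response(challenge, emphasized).lower()


challenge = "h3e4lu89l2j6o"
emphasized_positions = [0, 2, 4, 8, 12]   # positions of h, e, l, l, o
print(expected_response(challenge, emphasized_positions))   # prints "hello"
```

The transcript check covers only what was said; the inaudible sound would still be verified separately.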

In an embodiment, the user may be asked to repeat certain words/characters displayed on the communication device at a particular pace/speed. The pace/speed can be determined by the server computer. For example, certain characters or words may be displayed to the user on his/her communication device based on a certain tempo and the user may be asked to repeat those characters/words at the same or substantially similar tempo. If the input received from the user is within a margin of error for the desired tempo, the user can be authenticated. In other embodiments, the user may be asked to enter a personal identification number (PIN) in addition to the voice data and the supplemental signal and all of these may be transmitted to the server computer or payment processing network for use in authenticating the user.
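The margin-of-error test for tempo can be sketched as follows. It assumes the onset time of each spoken word or character is available in seconds and that the margin is expressed as a fraction of the target interval; both the representation and the default margin are hypothetical.

```python
from typing import Sequence


def tempo_within_margin(onsets_s: Sequence[float], target_interval_s: float,
                        margin: float = 0.25) -> bool:
    """Check that every gap between consecutive onsets is close to the target interval."""
    if len(onsets_s) < 2:
        return False   # a single onset carries no tempo information
    intervals = [later - earlier for earlier, later in zip(onsets_s, onsets_s[1:])]
    allowed = margin * target_interval_s
    return all(abs(interval - target_interval_s) <= allowed for interval in intervals)
```

With a target interval of 0.5 seconds and the default margin, gaps between 0.375 and 0.625 seconds would be accepted.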

In still other embodiments, the user may communicatively couple a contactless card with his communication device to transfer information from the contactless card to the communication device. This can be done in addition to capturing the supplemental signal.

It should be appreciated that the specific steps illustrated in FIG. 9 provide a particular method for authenticating a user for a transaction at a communication device using speaker verification, according to an embodiment of the present invention. Other sequences of steps may also be performed according to alternative embodiments. For example, alternative embodiments of the present invention may perform the steps outlined above in a different order. Moreover, the individual steps illustrated in FIG. 9 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step. Furthermore, additional steps may be added or removed depending on the particular applications. One of ordinary skill in the art would recognize and appreciate many variations, modifications, and alternatives of the method 900.

I. Exemplary Systems

FIG. 10 is a diagram of a computer apparatus 1000, according to an example embodiment. The various participants and elements in the previously described system diagram (e.g., the communication device, payment processing network, acquiring bank, issuing bank, etc., in FIG. 1 or the server computer in FIG. 3) may use any suitable number of subsystems in the computer apparatus to facilitate the methods and/or functions described herein. Examples of such subsystems or components are shown in FIG. 10. The subsystems shown in FIG. 10 are interconnected via a system bus 1005. Additional subsystems such as a printer 1040, keyboard 1070, fixed disk 1080 (or other memory comprising computer-readable media), monitor 1055, which is coupled to display adapter 1050, and others are shown. Peripherals and input/output (I/O) devices (not shown), which couple to I/O controller 1010, can be connected to the computer system by any number of means known in the art, such as serial port 1060. For example, serial port 1060 or external interface 1090 can be used to connect the computer apparatus to a wide area network such as the Internet, a mouse input device, or a scanner. Alternatively, peripherals can be connected wirelessly (e.g., IR, Bluetooth, etc.). The interconnection via system bus allows the central processor 1030 to communicate with each subsystem and to control the execution of instructions from system memory 1020 or the fixed disk 1080, as well as the exchange of information between subsystems. The system memory 1020 and/or the fixed disk 1080 (e.g., hard disk, solid state drive, etc.) may embody a computer-readable medium.

The software components or functions described in this application may be implemented as software code to be executed by one or more processors using any suitable computer language such as, for example, Java, C++ or Perl using, for example, conventional or object-oriented techniques. The software code may be stored as a series of instructions, or commands on a computer-readable medium, such as a random access memory (RAM), a read-only memory (ROM), a magnetic medium such as a hard-drive or a floppy disk, or an optical medium such as a CD-ROM. Any such computer-readable medium may also reside on or within a single computational apparatus, and may be present on or within different computational apparatuses within a system or network.

The present invention can be implemented in the form of control logic in software or hardware or a combination of both. The control logic may be stored in an information storage medium as a plurality of instructions adapted to direct an information processing device to perform a set of steps disclosed in embodiments of the present invention. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the present invention.

In embodiments, any of the entities described herein may be embodied by a computer that performs any or all of the functions and steps disclosed.

Any recitation of “a”, “an” or “the” is intended to mean “one or more” unless specifically indicated to the contrary.

One or more embodiments of the invention may be combined with one or more other embodiments of the invention without departing from the spirit and scope of the invention.

The above description is illustrative and is not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of the disclosure. The scope of the invention should, therefore, be determined not with reference to the above description, but instead should be determined with reference to the pending claims along with their full scope or equivalents.

What is claimed is:
 1. A method for authenticating a user for a transaction, comprising: receiving, by a device, a user request to initiate the transaction; transmitting, by the device and to a server computer, a notification of the user request to the initiate the transaction; in response to transmitting the notification to the server computer, receiving, by the device and from the server computer, a supplemental signal comprising an inaudible sound and an element unique to the transaction; in response to receiving the supplemental signal, displaying, by the device, a word string to be repeated by the user, wherein the word string comprises a random element and a fixed element; playing, by the device, the received supplemental signal; concurrently recording, by the device, (a) an audio segment originating from the user while the user attempts to vocally reproduce the displayed word string and (b) the received supplemental signal; and sending, by the device, the concurrently recorded audio segment and the received supplemental signal, to the server computer, wherein the server computer authenticates the user based at least in part on (a) the sent audio segment matching with a voice profile associated with the user and (b) the supplemental signal sent by the device matching the supplemental signal originally received by the device.
 2. The method of claim 1 wherein the supplemental signal is played by a point-of-sale (POS) device while authenticating the user for the transaction.
 3. The method of claim 1, wherein the inaudible sound comprises frequency components inaudible to a human being.
 4. The method of claim 1, wherein the element unique to the transaction comprises at least one of a unique frequency, a unique pitch, or a unique speed of the supplemental signal.
 5. A device, comprising: a processor; and a non-transitory computer-readable storage medium, comprising code executable by the processor for implementing a method for authenticating a user for a transaction, the method comprising: receiving, by the device, a user request to initiate the transaction; transmitting, by the device and to a server computer, a notification of the user request to the initiate the transaction; in response to transmitting the notification to the server computer, receiving, by the device and from the server computer, a supplemental signal comprising an inaudible sound and an element unique to the transaction; in response to receiving the supplemental signal, displaying, by the device, a word string to be repeated by the user, wherein the word string comprises a random element and a fixed element; playing, by the device, the received supplemental signal; concurrently recording, (a) by the device, an audio segment originating from the user while the user attempts to vocally reproduce the displayed word string and (b) the received supplemental signal; and sending, by the device, the concurrently recorded audio segment and the received supplemental signal, to the server computer, wherein the server computer authenticates the user based at least in part on (a) the sent audio segment matching with a voice profile associated with the user and (b) the supplemental signal sent by the device matching the supplemental signal originally received by the device.
 6. The device of claim 5, wherein the supplemental signal is played by a point-of-sale (POS) device while authenticating the user for the transaction.
 7. The device of claim 5, wherein the inaudible sound comprises frequency components inaudible to a human being.
 8. The device of claim 5, wherein the element unique to the transaction comprises at least one of a unique frequency, a unique pitch, or a unique speed of the supplemental signal.
 9. A method for authenticating a user for a transaction, comprising: receiving, from a device and by a server computer, a request to initiate the transaction; transmitting, by the server computer and to the device, a supplemental signal comprising an inaudible sound and an element unique to the transaction; receiving, by the server computer, an authorization request comprising (a) an audio segment and (b) the transmitted supplemental signal, wherein the audio segment and the transmitted supplemental signal were concurrently recorded by the device while the device plays the transmitted supplemental signal and while the user attempts to vocally reproduce a word string displayed by the device, and wherein the word string comprises a random element and a fixed element; verifying, by the server computer, that (a) the supplemental signal received in the authorization request matches the supplemental signal originally transmitted to the device and (b) the received audio segment matches with a voice profile associated with the user; and authenticating, by the server computer, the user for the transaction based at least in part on the verifying.
 10. The method of claim 9, further comprising generating, by the server computer, the supplemental signal.
 11. The method of claim 9, further comprising receiving a payment authorization request associated with the transaction, the payment authorization request including the concurrently recorded audio segment and supplemental signal.
 12. The method of claim 9, wherein the inaudible sound comprises frequency components inaudible to a human being.
 13. The method of claim 9, wherein the element unique to the transaction comprises at least one of a unique frequency, a unique pitch, or a unique speed of the supplemental signal.
 14. A server computer, comprising: a processor; and a non-transitory computer-readable storage medium, comprising code executable by the processor for implementing a method for authenticating a user for a transaction, the method comprising: receiving, from a device and by the server computer, a request to initiate a transaction; transmitting, by the server computer and to the device, a supplemental signal comprising an inaudible sound and an element unique to the transaction; receiving, by the server computer, an authorization request comprising (a) an audio segment and (b) the transmitted supplemental signal, wherein the audio segment and the transmitted supplemental signal were concurrently recorded by the device while the device plays the transmitted supplemental signal and while the user attempts to vocally reproduce a word string displayed by the device, and wherein the word string comprises a random element and a fixed element; verifying, by the server computer, that (a) the supplemental signal received in the authorization request matches the supplemental signal originally transmitted to the device and (b) the received audio segment matches with a voice profile associated with the user; and authenticating, by the server computer, the user for the transaction based at least in part on the verifying.
 15. The server computer of claim 14, wherein the method further comprises generating, by the server computer, the supplemental signal.
 16. The server computer of claim 14, wherein the method further comprises receiving a payment authorization request associated with the transaction, the payment authorization request including the concurrently recorded audio segment and supplemental signal.
 17. The server computer of claim 14, wherein the inaudible sound comprises frequency components inaudible to a human being.
 18. The server computer of claim 14, wherein the element unique to the transaction comprises at least one of a unique frequency, a unique pitch, or a unique speed of the supplemental signal.
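
By way of illustration only, and without limiting the claims, the supplemental-signal check recited above might be sketched as follows for the case where the element unique to the transaction is a unique frequency. The sample rate, the 18 to 20 kHz candidate band, the 100 Hz spacing, the thresholds, and all function names are assumptions made for this sketch and are not taken from the specification. Tone detection here uses the Goertzel algorithm as a stand-in for whatever matching technique an embodiment employs, and matching the audio segment against a voice profile is not shown.

```python
import math

# All constants below are hypothetical values chosen for this sketch.
SAMPLE_RATE = 48_000                    # samples per second
BAND_LOW, BAND_HIGH = 18_000, 20_000    # assumed near-inaudible band, in Hz
STEP = 100                              # spacing between candidate tones, in Hz


def make_tone(freq, n_samples, amplitude=0.2):
    """Synthesize a sine tone, e.g. the supplemental signal played by the device."""
    return [amplitude * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            for i in range(n_samples)]


def mix(voice, tone):
    """Model a concurrent recording as the sample-wise sum of voice and tone."""
    return [v + t for v, t in zip(voice, tone)]


def goertzel_power(samples, freq):
    """Squared DFT magnitude of `samples` at the bin nearest `freq` (Goertzel)."""
    n = len(samples)
    k = round(n * freq / SAMPLE_RATE)
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def signal_matches(recording, expected_freq, min_amplitude=0.05, ratio=10.0):
    """Server-side check: is the tone issued for this transaction in the recording?

    The expected tone must exceed an absolute power floor and must dominate
    every other candidate tone in the band by the given ratio.
    """
    n = len(recording)
    floor = (min_amplitude * n / 2) ** 2
    expected_power = goertzel_power(recording, expected_freq)
    other_powers = [goertzel_power(recording, f)
                    for f in range(BAND_LOW, BAND_HIGH + 1, STEP)
                    if f != expected_freq]
    return expected_power >= floor and expected_power > ratio * max(other_powers)
```

In this sketch, a server would select one candidate frequency per transaction, retain it, and later call `signal_matches` on the recording received in the authorization request. A recording replayed from an earlier transaction would carry a different tone and would fail the check. A band of 21 candidate tones is far too small to be unique per transaction in practice; it is used here only to keep the example short.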